A principal component analysis for LISA — the TDI connection 
Data from the Laser Interferometer Space Antenna (LISA) is expected to be dominated by frequency
noise from its lasers. However, the noise from any one laser appears more than once in the
data, and there are combinations of the data that are insensitive to this noise. These combinations,
called time delay interferometry (TDI) variables, have received careful study, and point the way to
how LISA data analysis may be performed. Here we approach the problem from the direction of 
statistical inference, and show that these variables are a direct consequence of a principal compo- 
nent analysis of the problem. We present a formal analysis for a simple LISA model and show that 
there are eigenvectors of the noise covariance matrix that do not depend on laser frequency noise. 
Importantly, these orthogonal basis vectors correspond to linear combinations of TDI variables. As 
a result we show that the likelihood function for source parameters using LISA data can be based 
on TDI combinations of the data without loss of information. 

PACS numbers: 02.50.Tt, 07.05.Kf, 95.55.Ym, 95.75.Kk, 95.75.Wx, 95.85.Sz 



I. INTRODUCTION 



The Laser Interferometer Space Antenna (LISA) is a 
space-borne gravitational telescope currently under development
by ESA and NASA j5| . It is designed to detect
gravitational radiation at frequencies between ~ 10^^ Hz 
and ~ 1 Hz, a band that is not readily accessible from 
Earth due to local noise. The current design comprises 
a constellation of three spacecraft in circular orbits of 
radius 1 AU around the Sun, in a near-equilateral configuration
of side $5 \times 10^{6}$ km. Gravitational waves passing
through the telescope modulate the separation between 
the spacecraft on scales of picometres. This modulation 
is sensed by laser beams exchanged between spacecraft, 
and recorded as the difference in frequency between the 
locally-generated and received laser signals. In simple 
terms there are therefore three independent lasers in the 
system, and six raw signal streams (referred to as Doppler 
measurements in the literature, e.g., 0) corresponding to 
the six bidirectional baseline combinations of these lasers. 

The main source of noise in these raw streams comes 
from the relative frequency stability of the reference 
lasers, giving a spectral density of ~ 10"^'^ Hz~^/^ in the 
millihertz band. This laser frequency noise results in a 
strain noise floor ~ 10'' times higher than the target sen- 
sitivity of LISA and, at first sight, severely limits LISA's 
performance. The scenario is very similar to that encoun- 
tered in radio astronomical very long baseline interferom- 
etry (VLBI) , where free-running local oscillators are used 
at each end of the interferometric baseline, giving the re- 
sultant fringes an unknown and varying fringe rate. The 
breakthrough in radio astronomy came with the realisa- 
tion that with three or more telescopes, and therefore 
three or more baselines, one can form closure quantities
that are insensitive to the relative phases of the local os- 
cillators. The simplest of these relations is closure phase, 
comprising the sum of interferometric phases around a 
loop of baselines |0] . The phase of each local oscillator 
appears twice in such a sum and with opposite signs, so 
that closure phase contains only baseline-dependent (and 
therefore astronomical), rather than antenna-dependent 
contributions. This direct method was used in the early 
days of radio interferometry 0, in VLBI and more 
recently in optical interferometry ^ . In modern radio as- 
tronomy these principles are encapsulated in the idea of 
self calibration, in which each antenna in the interferom- 
eter is allocated an unknown complex gain factor !J Il2j | . 
The interferometer data are then used to simultaneously 
determine both these gain factors and the sky map, so 
greatly increasing the performance of the instrument. 

The situation for LISA is more complicated. Laser 
noise can be canceled only if the closure relations take 
account of the light travel time between spacecraft, and 
the presence of bidirectional beams along the baselines 
increases the number of possible relations. This has led 
to the development of Time-Delay Interferometry (TDI) 
variables for LISA jlQ]. These are linear combinations 
of the six LISA data streams, suitably offset in time to 
cancel the noise contributions from the three lasers. TDI 
variables have received detailed attention in the literature 
and there is now a sophisticated understanding of their 
generation and properties 0, IE S 1^ 7 extending 
to 'second-generation' TDI variables that take account of 
the slight relative motion of the spacecraft [TsL IT^ . 

The very existence of TDI variables clearly shows that 
LISA is capable of generating data that is sensitive to astronomical
sources but not to laser noise. This is an important
step, but it does not tell us how to use these TDI
variables to do astronomy. For this we need a method of 
using these derived data to constrain the sky and make 
statements about the parameters of individual sources of
gravitational radiation. The natural framework for such
statements is that of statistical inference, in which we
construct a Bayesian probability for a source parameter,
or set of parameters, $a$, given the LISA data, $d$. Formally

$$p(a|d) \propto p(a)\, p(d|a) , \tag{1}$$

where $p(d|a)$ is the probability of getting a certain set
of data given a noise model and a particular value for
$a$. This is familiar as Bayes' Theorem, and requires the
prior probability for $a$, $p(a)$, before it can be fully applied.
Although priors form an important part of any
analysis we will not concentrate on them here and they
do not depend on the current data. The quantity $p(d|a)$
is usually called the likelihood of $a$ and, importantly, fully
defines how the data enters into the calculation. Indeed
the likelihood contains all that the data has to say on
the matter, so that the heart of a parameter estimation
problem is fully defined once a likelihood is written down.

We will show below that the likelihood function for
LISA contains several insights into the meaning and role
of TDI variables. We consider the noise covariance matrix
for a series of LISA Doppler measurements over several
time steps. For Gaussian noise, the inverse of this
covariance matrix defines the log likelihood of the parameters,
and its quadratic form defines contours of equal
likelihood, in a space equal in dimension to the number
of data points used. The eigenvectors of this covariance
matrix are the principal axes of the equal-likelihood
hyperellipsoidal surface, and correspond to linear combinations
of the data that give maximal and minimal covariance.
Principal component analysis (PCA) is simply the
process of identifying these eigenvectors and using only a
subset of them to characterise the data. Usually it is the
components with the largest eigenvalues that are desired,
as these are the data combinations that contain the majority
of the covariance. These components are used in
fields such as pattern recognition in order to generalise
or compress data sets. For LISA however, these principal
components are the ones that contain the highly correlated
laser frequency noise. In contrast therefore, we are
interested in the existence of eigenvalues that do not depend
on the laser noise and are minimal. We will show
that the eigenvalues of the LISA covariance matrix fall
into two distinct groups, distinguished by their dependence
on laser noise. The group that is independent of
the common frequency noise corresponds directly to the
TDI variables considered above, in fact the relationship
is hinted at in |^ . Not all will necessarily be sensitive to
gravitational wave signals, but as a group they orthogonally
span the data sub-space that corresponds to LISA's
design noise floor. The other group still contain astronomy,
but are dominated by the effects of laser frequency
fluctuations.

In Section II we will consider a simple example of PCA
to highlight the essence of the method and demonstrate
its applicability to LISA data analysis. In Section III
we extend these ideas to tackle a real, but somewhat
simplified, LISA model and show that minor principal
components are TDI variables. In Section IV we discuss
some of the immediate implications of this work for the
design and data analysis of LISA.



II. SIMPLE EXAMPLE



Consider a single sample of data from two generic detectors

$$s_1 = p + n_1 + h_1 , \tag{2}$$
$$s_2 = p + n_2 + h_2 , \tag{3}$$

where $n_1$, $n_2$ are uncorrelated noises in the individual
detectors, $p$ is a common noise term, and $h_1$, $h_2$ are the
astrophysical signals of interest. For simplicity, we assume
that the noises are Gaussian-distributed with zero
mean and variances

$$\langle n_1^2 \rangle = \langle n_2^2 \rangle = \sigma_n^2 \quad \text{and} \quad \langle p^2 \rangle = \sigma_p^2 , \tag{4}$$

and that they are mutually uncorrelated, so that

$$\langle n_1 n_2 \rangle = \langle n_1 p \rangle = \langle n_2 p \rangle = 0 . \tag{5}$$

In addition, we assume a simple model in which the astrophysical
signals in the two detectors are

$$h_1 = 2a \quad \text{and} \quad h_2 = a , \tag{6}$$

where $a$ is a fixed but unknown constant whose posterior
distribution, $p(a|s_1, s_2)$, we want to compute.

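To do this, in accordance with Eq. (1), we need to
calculate the likelihood $p(s_1, s_2|a)$. For Gaussian noise,

$$p(s_1, s_2|a) \propto \exp\left[ -\frac{1}{2} Q \right] , \tag{7}$$

where

$$Q = (s - h)^T \cdot C^{-1} \cdot (s - h) \tag{8}$$
$$\phantom{Q} = (s_i - h_i)\, C^{-1}_{ij}\, (s_j - h_j) \tag{9}$$

is a quadratic form, involving the inverse of the noise
covariance matrix $C$ whose elements are

$$C_{ij} = \langle (s_i - h_i)(s_j - h_j) \rangle . \tag{10}$$

Using Eqs. (4) and (5), we find

$$C = \begin{pmatrix} \sigma_p^2 + \sigma_n^2 & \sigma_p^2 \\ \sigma_p^2 & \sigma_p^2 + \sigma_n^2 \end{pmatrix} . \tag{11}$$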
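
As an illustrative check of Eq. (11), the following Python sketch draws many noise-only realisations of the two-detector model and compares the sample covariance with the analytic matrix. The numerical values of `sigma_p` and `sigma_n`, the number of samples and the random seed are arbitrary assumptions made for this illustration and are not taken from the analysis above.

```python
import numpy as np

def analytic_covariance(sigma_p, sigma_n):
    """Eq. (11): covariance of the noise parts of (s1, s2)."""
    return np.array([[sigma_p**2 + sigma_n**2, sigma_p**2],
                     [sigma_p**2, sigma_p**2 + sigma_n**2]])

def sample_covariance(sigma_p, sigma_n, n_samples, seed=0):
    """Monte Carlo estimate of <(s_i - h_i)(s_j - h_j)> for Eqs. (2)-(5)."""
    rng = np.random.default_rng(seed)
    p = rng.normal(0.0, sigma_p, n_samples)    # common noise term
    n1 = rng.normal(0.0, sigma_n, n_samples)   # detector 1 noise
    n2 = rng.normal(0.0, sigma_n, n_samples)   # detector 2 noise
    residuals = np.vstack([p + n1, p + n2])    # s - h for each draw
    return residuals @ residuals.T / n_samples

sigma_p, sigma_n = 3.0, 1.0                    # assumed illustrative values
C_exact = analytic_covariance(sigma_p, sigma_n)
C_est = sample_covariance(sigma_p, sigma_n, n_samples=200_000)
print(C_exact)
print(np.round(C_est, 2))
```

The off-diagonal elements are set entirely by the common noise term, which is what makes the difference of the two data streams insensitive to it.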



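Principal component analysis simplifies the calculation
of the likelihood $p(s_1, s_2|a)$, by identifying the eigenvectors
of this covariance matrix $C$. Note that the eigenvectors
of $C$ are also the eigenvectors of $C^{-1}$, but with
reciprocal eigenvalues. The eigenvectors of $C$ in Eq. (11)
are

$$e_+ = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix} \quad \text{and} \quad e_- = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix} , \tag{12}$$

with eigenvalues

$$\lambda_+ = 2\sigma_p^2 + \sigma_n^2 \quad \text{and} \quad \lambda_- = \sigma_n^2 , \tag{13}$$

respectively. If we form the matrix of eigenvectors

$$E = (e_+ \;\; e_-)^T = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} , \tag{14}$$

and use the facts that

$$E \cdot E^T = E^T \cdot E = 1 , \tag{15}$$
$$E \cdot C^{-1} \cdot E^T = \begin{pmatrix} 1/\lambda_+ & 0 \\ 0 & 1/\lambda_- \end{pmatrix} \equiv \Lambda^{-1} , \tag{16}$$

and

$$E \cdot (s - h) = \frac{1}{\sqrt{2}} \begin{pmatrix} (s_1 + s_2) - (h_1 + h_2) \\ (s_1 - s_2) - (h_1 - h_2) \end{pmatrix} , \tag{17}$$

we can rewrite $Q$ as

$$Q = (s - h)^T \cdot C^{-1} \cdot (s - h) \tag{18}$$
$$\phantom{Q} = (s - h)^T \cdot (E^T \cdot E) \cdot C^{-1} \cdot (E^T \cdot E) \cdot (s - h) \tag{19}$$
$$\phantom{Q} = (E \cdot (s - h))^T \cdot \Lambda^{-1} \cdot (E \cdot (s - h)) \tag{20}$$
$$\phantom{Q} = \frac{1}{2} \left[ \frac{1}{2\sigma_p^2 + \sigma_n^2} (s_+ - 3a)^2 + \frac{1}{\sigma_n^2} (s_- - a)^2 \right] , \tag{21}$$

where we have defined $s_+$, $s_-$ to be the eigencombinations

$$s_+ \equiv s_1 + s_2 , \tag{22}$$
$$s_- \equiv s_1 - s_2 . \tag{23}$$
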
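Thus, the likelihood $p(s_1, s_2|a)$ factorises into a product
of likelihoods

$$p(s_1, s_2|a) \propto p(s_+|a)\, p(s_-|a) , \tag{24}$$

where

$$p(s_+|a) \propto \exp\left[ -\frac{1}{2} \frac{(s_+ - 3a)^2}{4\sigma_p^2 + 2\sigma_n^2} \right] , \tag{25}$$
$$p(s_-|a) \propto \exp\left[ -\frac{1}{2} \frac{(s_- - a)^2}{2\sigma_n^2} \right] . \tag{26}$$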


Note that the $s_-$ data combination, Eq. (23), corresponds
to the minimum eigenvalue $\lambda_- = \sigma_n^2$, which is
independent of the common noise variance $\sigma_p^2$. In this
sense, $s_-$ is the preferred data combination for this example,
since it is least affected by the various noise terms
(see Fig. 1). Moreover, if $\sigma_p^2 \gg \sigma_n^2$, as is the case for the
common laser frequency noise for LISA, then there is (effectively)
no loss in information by doing statistical inference
with only the $s_-$ data combination. This is most
easily seen by writing down the posterior distribution for
$a$:

$$p(a|s_1, s_2) \propto p(s_1, s_2|a)\, p(a) \tag{27}$$
$$\phantom{p(a|s_1, s_2)} \propto p(s_+|a)\, p(s_-|a)\, p(a) \tag{28}$$
$$\phantom{p(a|s_1, s_2)} \propto p(s_-|a)\, p(a) , \tag{29}$$

where the last proportionality follows from the fact
that the Gaussian $p(s_+|a)$ is effectively constant over
the range of $a$-values for which $p(s_-|a)$ is peaked (cf.
Eqs. (25) and (26) with $\sigma_p^2 \gg \sigma_n^2$). Thus, PCA has simplified
the analysis of this particular problem by identifying
a combination of the original data (in this case
$s_- = s_1 - s_2$) that captures nearly all the available information
on our parameter $a$.

FIG. 1: Data probability contours referred to (a) the original
data axes and (b) axes corresponding to the eigencombinations
of the data (with zero covariance).
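
To make the "no loss in information" statement concrete, here is a minimal sketch that computes the Gaussian posterior for $a$ with and without the $s_+$ combination. It assumes a flat prior $p(a)$, which the text does not specify, and uses arbitrary illustrative numbers for the noise levels and the data sample; the function name `posterior_for_a` is hypothetical.

```python
def posterior_for_a(s1, s2, sigma_p, sigma_n, use_s_plus=True):
    """Mean and standard deviation of the Gaussian posterior for a.

    Assumes a flat prior p(a), so the posterior is the product of the
    s_+ and s_- likelihoods, or the s_- likelihood alone if
    use_s_plus is False.
    """
    s_plus, s_minus = s1 + s2, s1 - s2
    var_plus = 4 * sigma_p**2 + 2 * sigma_n**2   # variance of s_plus
    var_minus = 2 * sigma_n**2                   # variance of s_minus
    precision = 1.0 / var_minus                  # s_minus measures a
    weighted = s_minus / var_minus
    if use_s_plus:                               # s_plus measures 3a
        precision += 9.0 / var_plus
        weighted += 3.0 * s_plus / var_plus
    return weighted / precision, precision**-0.5

sigma_p, sigma_n = 1.0e3, 1.0                    # assumed: sigma_p >> sigma_n
s1, s2 = 1500.0, 1499.2                          # arbitrary data sample
mean_full, std_full = posterior_for_a(s1, s2, sigma_p, sigma_n)
mean_minus, std_minus = posterior_for_a(s1, s2, sigma_p, sigma_n,
                                        use_s_plus=False)
print(mean_full, std_full)
print(mean_minus, std_minus)
```

With these numbers the two posteriors have nearly identical widths, and their means differ by far less than one standard deviation.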



III. LISA EXAMPLE 

As mentioned in Sec. I, the current design of LISA comprises
three spacecraft in circular orbits of radius 1 AU
around the Sun, in a near-equilateral configuration of side
$5 \times 10^{6}$ km. The tiny (picometre) modulation of the separation
between the spacecraft produced by the passage of
gravitational waves is sensed by Doppler measurements 
of laser beams exchanged between spacecraft, recorded as 
the difference in frequency between the locally-generated 
and received laser signals at each spacecraft. Since each 
spacecraft receives laser signals sent from the other two 
spacecraft, there are six raw data streams in total, denoted
$s_1$, $s'_1$, $s_2$, $s'_2$, $s_3$, $s'_3$, where the subscript indicates
the spacecraft receiving the laser beam, and is unprimed
or primed depending on whether the beam is traveling
counter-clockwise or clockwise around the LISA triangle,
as viewed from above (see Fig. 2).

FIG. 2: Schematic of LISA configuration, following the conventions
in Q. The spacecraft are labeled 1, 2, 3. The separations
between spacecraft are denoted by $L_i$, $L'_i$, where the
index $i$ corresponds to the opposite spacecraft. The beam
arriving at spacecraft $i$ has subscript $i$ and is unprimed or
primed depending on whether the beam is traveling counter-clockwise
or clockwise around the LISA triangle, as viewed
from above.

Following the notation in 0, we write the six LISA
data streams as

$$s_1 = \mathcal{D}_3 p_2 - p_1 + n_1 + h_1 , \tag{30}$$
$$s'_1 = \mathcal{D}_2 p_3 - p_1 + n'_1 + h'_1 , \tag{31}$$

together with their cyclic permutations ($1 \to 2 \to 3 \to 1$)
at spacecraft 2 and 3. Here $p_i$ is the frequency noise
associated with the two lasers (assumed for now to be
locked to one another) on spacecraft $i$ ($i = 1, 2, 3$); $n_i$
and $n'_i$ are all other noises associated with the transmission
of the signal to spacecraft $i$ in the counter-clockwise
and clockwise directions, and $h_i$ and $h'_i$ are the frequency
modulations produced by the astrophysical signals. $\mathcal{D}_i$ is
a delay operator that takes a data stream $x(t)$ and delays
it by the light travel time down the arm $L_i$:

$$\mathcal{D}_i x(t) = x(t - L_i) , \tag{32}$$

in units where the speed of light $c = 1$. Explicitly,

$$s_1(t) = p_2(t - L_3) - p_1(t) + n_1(t) + h_1(t) , \tag{33}$$
$$s'_1(t) = p_3(t - L_2) - p_1(t) + n'_1(t) + h'_1(t) , \tag{34}$$

and similarly for the data streams at spacecraft 2 and 3.
Note that we are restricting ourselves in this example to
the case of a non-rotating LISA configuration with fixed
arm-lengths. We assume that the light travel time down
an arm is independent of the direction in which it is moving
(counter-clockwise or clockwise) and is independent
of the time of emission. This corresponds to $L_i = L'_i$ in
Fig. 2.

In practice, the data streams will be discretely-sampled
on-board the spacecraft, with sampling period $\Delta t$. For
simplicity, we assume that the light travel times down
the arms are related by simple integers; in particular,

$$L_1 = \Delta t , \quad L_2 = 2\Delta t , \quad L_3 = 3\Delta t . \tag{35}$$

This restriction is made only to minimise the number 
of data points needed to illustrate the PCA method. It 
should be relatively straightforward to extend our anal- 
ysis to the case of more complicated light travel times, 
as well as to situations where there is relative motion 
between the spacecraft (i.e., $L_i \neq L'_i$).

If we denote the discrete time stamps by $t_\alpha = \alpha\, \Delta t$
and the value of data stream $x(t)$ at $t = t_\alpha$ by

$$x[\alpha] = x(t_\alpha) , \tag{36}$$

then the values of the six LISA data streams at time
stamp $\alpha = 1$ become

$$s_1[1] = p_2[-2] - p_1[1] + n_1[1] + h_1[1] , \tag{37}$$
$$s'_1[1] = p_3[-1] - p_1[1] + n'_1[1] + h'_1[1] , \tag{38}$$
$$s_2[1] = p_3[0] - p_2[1] + n_2[1] + h_2[1] , \tag{39}$$
$$s'_2[1] = p_1[-2] - p_2[1] + n'_2[1] + h'_2[1] , \tag{40}$$
$$s_3[1] = p_1[-1] - p_3[1] + n_3[1] + h_3[1] , \tag{41}$$
$$s'_3[1] = p_2[0] - p_3[1] + n'_3[1] + h'_3[1] , \tag{42}$$

where we used Eq. (35) to explicitly evaluate the arguments
of the time-delayed laser frequency noise. To obtain
expressions for the data streams evaluated at other
time stamps, we simply increment or decrement the arguments
of the data streams by the appropriate number
of sampling periods.
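
The bookkeeping behind these expressions can be sketched in a few lines of Python. The table `STREAMS` and the helper `laser_terms` are hypothetical names introduced for this illustration; the delays are the integer arm lengths assumed in the text, in units of the sampling period.

```python
# Light travel times in units of the sampling period: L1 = 1, L2 = 2, L3 = 3.
ARM_DELAY = {1: 1, 2: 2, 3: 3}

# stream -> (receiving spacecraft, sending spacecraft, arm travelled),
# read off the two defining equations for s1, s1' and their cyclic
# permutations (1 -> 2 -> 3 -> 1).
STREAMS = {
    "s1": (1, 2, 3), "s1'": (1, 3, 2),
    "s2": (2, 3, 1), "s2'": (2, 1, 3),
    "s3": (3, 1, 2), "s3'": (3, 2, 1),
}

def laser_terms(stream, alpha):
    """Return [((laser, time stamp), sign), ...] for one data point."""
    local, remote, arm = STREAMS[stream]
    return [((remote, alpha - ARM_DELAY[arm]), +1), ((local, alpha), -1)]

# Print the laser-noise content of each stream at time stamp 1.
for name in STREAMS:
    delayed, local = laser_terms(name, 1)
    print(f"{name}[1]: +p{delayed[0][0]}[{delayed[0][1]}] "
          f"- p{local[0][0]}[{local[0][1]}]")
```

Evaluating `laser_terms(name, alpha)` for other values of `alpha` shifts every argument by the same number of sampling periods, as described above.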

To simplify the calculation further, we will only consider
data streams having time stamps $\alpha = 1, 2, 3, 4, 5$.
Since there are six data streams in total, this corresponds
to a 30-dimensional vector space of data points. The
noise covariance $C$ is thus a $30 \times 30$ matrix, whose $\alpha\beta$th
element is itself a $6 \times 6$ matrix, $C_{\alpha\beta}$, containing the
covariances between the six data streams at time stamps
$\alpha$ and $\beta$.

As the matrix $C_{\alpha\beta}$ depends only on the difference between
$\alpha$ and $\beta$ (i.e., $C_{11} = C_{22}$, $C_{12} = C_{23}$, etc.), the full
covariance matrix $C$ is block Toeplitz (i.e., it is constant
along the $\alpha - \beta$ diagonals). Since $C$ is also symmetric, we
need only calculate $C_{11}$, $C_{12}$, $C_{13}$, $C_{14}$, and $C_{15}$ to fully
determine $C$.

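Finally, we assume that the laser frequency noise $p_i[\alpha]$
and individual noise terms $n_i[\alpha]$, $n'_i[\alpha]$ are Gaussian-distributed
with zero mean and variances

$$\langle n_i[\alpha]\, n_j[\beta] \rangle = \langle n'_i[\alpha]\, n'_j[\beta] \rangle = \delta_{\alpha\beta}\, \delta_{ij}\, \sigma_n^2 , \tag{44}$$
$$\langle p_i[\alpha]\, p_j[\beta] \rangle = \delta_{\alpha\beta}\, \delta_{ij}\, \sigma_p^2 , \tag{45}$$

(i.e., the random processes are white), and that they are
mutually uncorrelated, so that

$$\langle n_i[\alpha]\, n'_j[\beta] \rangle = \langle n_i[\alpha]\, p_j[\beta] \rangle = \langle n'_i[\alpha]\, p_j[\beta] \rangle = 0 . \tag{46}$$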



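Given these assumptions, explicit expressions for the
sub-matrices $C_{11}$, $C_{12}$, $C_{13}$, $C_{14}$ and $C_{15}$ follow
[Eqs. (47)-(51)]. In terms of these sub-matrices, the full covariance matrix
is

$$C = \begin{pmatrix}
C_{11} & C_{12} & C_{13} & C_{14} & C_{15} \\
C_{12}^T & C_{11} & C_{12} & C_{13} & C_{14} \\
C_{13}^T & C_{12}^T & C_{11} & C_{12} & C_{13} \\
C_{14}^T & C_{13}^T & C_{12}^T & C_{11} & C_{12} \\
C_{15}^T & C_{14}^T & C_{13}^T & C_{12}^T & C_{11}
\end{pmatrix} , \tag{52}$$

and is shown schematically in Fig. 3.

FIG. 3: Schematic representation of the covariance matrix,
$C$, for the simple LISA model. Blocks of increasing grey represent
values of $2\sigma_p^2 + \sigma_n^2$, $\sigma_p^2$, and $-\sigma_p^2$.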
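
The structure of this matrix can be reproduced numerically. The sketch below is one possible construction, not the calculation used in the text: it writes the laser-noise part of the 30 data points as a matrix `M` acting on the laser samples $p_i[k]$, so that $C = \sigma_p^2 M M^T + \sigma_n^2 \mathbb{1}$ under the white-noise assumptions above. The ordering of the data vector (time stamp as the slow index, then $s_1, s'_1, s_2, s'_2, s_3, s'_3$), the helper names and the numerical noise levels are all assumptions made for this illustration.

```python
import numpy as np

ARM_DELAY = {1: 1, 2: 2, 3: 3}        # arm lengths in units of Delta t
# (receiving spacecraft, sending spacecraft, arm) for s1, s1', s2, s2', s3, s3'
STREAMS = [(1, 2, 3), (1, 3, 2), (2, 3, 1), (2, 1, 3), (3, 1, 2), (3, 2, 1)]
N_STAMPS = 5

def laser_mixing_matrix():
    """M such that the laser part of the data is M @ (laser samples p_i[k])."""
    column_of = {}                    # (laser, time stamp) -> column index
    rows = []
    for alpha in range(1, N_STAMPS + 1):
        for local, remote, arm in STREAMS:
            terms = {(remote, alpha - ARM_DELAY[arm]): 1.0,
                     (local, alpha): -1.0}
            for key in terms:
                column_of.setdefault(key, len(column_of))
            rows.append(terms)
    M = np.zeros((len(rows), len(column_of)))
    for r, terms in enumerate(rows):
        for key, sign in terms.items():
            M[r, column_of[key]] = sign
    return M

def covariance(sigma_p, sigma_n):
    """Noise covariance of the 30 data points for white, uncorrelated noises."""
    M = laser_mixing_matrix()
    return sigma_p**2 * (M @ M.T) + sigma_n**2 * np.eye(M.shape[0])

sigma_p, sigma_n = 10.0, 1.0          # assumed illustrative values
C = covariance(sigma_p, sigma_n)
# blocks[a, b] is the 6 x 6 sub-matrix linking time stamps a + 1 and b + 1.
blocks = C.reshape(N_STAMPS, 6, N_STAMPS, 6).swapaxes(1, 2)
print(np.unique(C))
```

In this construction the only values that occur are $2\sigma_p^2 + \sigma_n^2$ on the diagonal, $\pm\sigma_p^2$ wherever two data points share a laser sample, and zero elsewhere, in line with the caption of Fig. 3.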



It is now an exercise in linear algebra to compute the
eigenvectors and eigenvalues of $C$. We use MAPLE ^
to do this calculation for us. There are 22 distinct eigenvalues,
the smallest of which is independent of the laser
frequency noise. This minimal eigenvalue

$$\lambda_{\min} = \sigma_n^2 \tag{53}$$

is 9-fold degenerate. The nine eigenvectors corresponding
to $\lambda_{\min}$ orthogonally span a 9-dimensional vector subspace
of the 30-dimensional vector space of data points,
elements of which do not depend on the laser frequency
noise (see Table I). Thus, these eigencombinations correspond
to the TDI-like variables for this particular example.
In principle one could compute the covariance matrix
for the entire set of LISA data 10^ data points),
though in practice this should not be necessary. In our 
example we need only four time stamps to fully charac- 
terise the covariance matrix, and a long time sequence of 
data from the example can be generated using the same 
set of eigencombinations of Doppler measurements. We 
will refer to the time series generated by these combinations
as eigenstreams. One can show, e.g., that the
Sagnac combination for spacecraft 1 (denoted $\alpha(t)$ in 0)
is a linear combination of eigenvectors: $-e_4 - e_7 + e_9$.
Explicitly,

$$\alpha[5] = s'_1[5] + s'_3[3] + s'_2[2] - s_1[5] - s_2[2] - s_3[1] , \tag{54}$$

which is just the discretised version of

$$\alpha(t) = [s'_1(t) + \mathcal{D}_2 s'_3(t) + \mathcal{D}_1 \mathcal{D}_2 s'_2(t)] - [s_1(t) + \mathcal{D}_3 s_2(t) + \mathcal{D}_1 \mathcal{D}_3 s_3(t)] \tag{55}$$
$$\phantom{\alpha(t)} = [s'_1(t) + s'_3(t - L_2) + s'_2(t - L_1 - L_2)] - [s_1(t) + s_2(t - L_3) + s_3(t - L_1 - L_3)] , \tag{56}$$

(cf. Eq. (42) in Ref. 7J) appropriate for the arm lengths
given in Eq. (35).
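
These statements can be checked numerically without a computer algebra system. The following sketch uses NumPy rather than MAPLE, and rebuilds the $30 \times 30$ covariance matrix as $\sigma_p^2 M M^T + \sigma_n^2 \mathbb{1}$, where `M` maps laser samples to data points. The data-vector ordering, the helper names `index` and `covariance`, and the noise levels are assumptions of this illustration. It counts the eigenvalues equal to $\sigma_n^2$ and tests whether the discretised Sagnac combination $\alpha[5]$ is an eigenvector with that eigenvalue.

```python
import numpy as np

ARM_DELAY = {1: 1, 2: 2, 3: 3}        # arm lengths in units of Delta t
NAMES = ["s1", "s1'", "s2", "s2'", "s3", "s3'"]
# (receiving spacecraft, sending spacecraft, arm), in the order of NAMES
STREAMS = [(1, 2, 3), (1, 3, 2), (2, 3, 1), (2, 1, 3), (3, 1, 2), (3, 2, 1)]
N_STAMPS = 5

def index(name, alpha):
    """Position of data point name[alpha] in the 30-dimensional data vector."""
    return 6 * (alpha - 1) + NAMES.index(name)

def covariance(sigma_p, sigma_n):
    """sigma_p^2 M M^T + sigma_n^2 I, with M mapping laser samples to data."""
    column_of, rows = {}, []
    for alpha in range(1, N_STAMPS + 1):
        for local, remote, arm in STREAMS:
            terms = {(remote, alpha - ARM_DELAY[arm]): 1.0,
                     (local, alpha): -1.0}
            for key in terms:
                column_of.setdefault(key, len(column_of))
            rows.append(terms)
    M = np.zeros((len(rows), len(column_of)))
    for r, terms in enumerate(rows):
        for key, sign in terms.items():
            M[r, column_of[key]] = sign
    return sigma_p**2 * (M @ M.T) + sigma_n**2 * np.eye(len(rows))

sigma_p, sigma_n = 100.0, 1.0         # assumed illustrative values
C = covariance(sigma_p, sigma_n)
eigenvalues = np.linalg.eigvalsh(C)
n_laser_free = int(np.sum(np.isclose(eigenvalues, sigma_n**2)))

# Discretised Sagnac combination alpha[5] as a vector of data coefficients.
alpha5 = np.zeros(6 * N_STAMPS)
for sign, name, stamp in [(+1, "s1'", 5), (+1, "s3'", 3), (+1, "s2'", 2),
                          (-1, "s1", 5), (-1, "s2", 2), (-1, "s3", 1)]:
    alpha5[index(name, stamp)] = sign

is_eigenvector = np.allclose(C @ alpha5, sigma_n**2 * alpha5)
print(n_laser_free, is_eigenvector)
```

Any vector annihilated by $M^T$ carries no laser noise, so it is automatically an eigenvector of $C$ with eigenvalue $\sigma_n^2$; the Sagnac combination is one such vector.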

IV. DISCUSSION 

Although the LISA model considered above is greatly 
simplified, only considering fixed arm-lengths related by 
small integers and a basic noise model, we believe that 
the principal component approach is suitable for more 
sophisticated LISA models. In particular, we have indi- 
cated that the likelihood is the natural generating func- 
tion for orthogonal LISA data streams that are not domi- 
nated by laser frequency noise. This reduced set of eigen- 
streams is a sufficient basis (and possibly more than suf- 
ficient) to carry out all possible astronomy with LISA. 
The remaining eigenstreams contain information on the 
laser stability and are therefore important for instrumen- 
tal diagnostics rather than astrophysics. 

As we have emphasised, the frequency-noise-free eigenstreams
are directly equivalent to TDI variables, but
emerge naturally as a direct consequence of the likelihood 
analysis and so have a meaning that is directly relevant 



to subsequent data analysis. For example, if we take a 
model M of how an astrophysical source, or a number of 
sources, would appear in the LISA data and assume this 
model depends on a set of parameters a (such as sky po- 
sition, polarisation angle, etc.), then the joint posterior 
probability of these parameters is simply 

$$p(a|\{e_i\}, M) \propto p(a|M) \prod_i p(e_i|a, M) , \tag{57}$$

where $e_i$ are the orthogonal frequency-noise-free eigenstreams
generated from the data. In this way the LISA
data analysis problem is cast in the powerful frame- 
work of classic inference, suitable for attack by stan- 
dard search and exploration algorithms including Markov 
Chain Monte Carlo methods (e.g. |l.ll9ll20j). 
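
As a rough sketch of how such a posterior might be evaluated in practice, the function below sums Gaussian log-likelihoods over the eigenstreams and adds a log-prior. It assumes that every eigenstream sample is independent with variance $\sigma_n^2$ (unit-norm eigenvectors with the minimal eigenvalue); the function name `log_posterior`, the callable `model` and the toy templates are hypothetical and serve only to illustrate the product structure.

```python
import numpy as np

def log_posterior(params, eigenstreams, model, sigma_n, log_prior):
    """Unnormalised log-posterior for independent Gaussian eigenstreams.

    `model(params)` must return the predicted signal in each eigenstream,
    with the same shapes as `eigenstreams`.
    """
    log_like = 0.0
    for e_i, predicted_i in zip(eigenstreams, model(params)):
        residual = np.asarray(e_i) - np.asarray(predicted_i)
        log_like -= 0.5 * np.sum(residual**2) / sigma_n**2
    return log_prior(params) + log_like

# Toy usage: a single amplitude parameter scaling fixed templates.
templates = [np.array([1.0, 2.0, 0.5]), np.array([0.0, 1.0, -1.0])]

def toy_model(amplitude):
    return [amplitude * t for t in templates]

def flat_prior(amplitude):
    return 0.0

data = toy_model(2.0)                 # noise-free data with amplitude 2
grid = np.linspace(0.0, 4.0, 41)
values = [log_posterior(A, data, toy_model, 1.0, flat_prior) for A in grid]
best = grid[int(np.argmax(values))]
print(best)
```

A function of this form can be handed directly to a grid search or a Markov Chain Monte Carlo sampler, since only the unnormalised posterior is needed.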

More complex models of the LISA spacecraft will of 
course increase the complexity and size of the covariance 
matrix, but the basic principle will remain the same. For 
example, differing laser shot noise in the arms will break 
the eigenvalue degeneracy in the above example. In addition,
the lengths $L_i$ and $L'_i$ cannot be assumed equal
when there is relative spacecraft motion and as a result
we expect the larger covariance matrix to yield second-generation
TDI variables from its eigenstreams.

The eigenstreams are defined in terms of their minimal 
variance rather than the amount of astronomy they con- 
tain, and there is no reason why they should all contain 
astronomical information. The eigenstreams, or combi- 
nations of eigenstreams, that are devoid of astronomi- 
cal data are 'zero-signal solutions' and can help in 
instrument diagnostics in circumstances when the astro- 
nomical signals would dominate (such as in the low fre- 
quency confusion limit of LISA) . 

We are reminded here of the importance of careful data 
acquisition and sampling for LISA. It is well-understood 
that only identically represented samples of the laser 
frequency noise will cancel effectively, and there could 
be times when the covariance between Doppler channels 
from this noise is reduced. This would be encoded as an 
increase in the baseline-dependent noise terms, $n_i$ and
$n'_i$, and in the limit as $\sigma_n^2$ approached $\sigma_p^2$ the eigenvalues
of the covariance matrix would cease to break into two 
groups. 

Finally, although PCA is often used for data compres- 
sion and, as we have shown, the TDI-like eigenstreams 
contain all the 'good quality' astronomical data from 
LISA, we are not proposing that these be generated on 
the spacecraft and relayed back to Earth. The saving in 
telemetry bandwidth would be clear, but having the raw 
Doppler data on Earth would greatly enhance the flexi- 
bility of the analysis and the robustness of the mission. 